[LV] An attempt to cherry-pick the fix PR #132691 (cherry-pick from the main branch to the release/20.x branch) #135231

pawosm-arm · 2025-04-10T18:42:40Z

This is to backport the fix for #126836 to the release/20.x branch. Tested on the real thing.

…lvm#130048) From what I understand, we only create VPReductionRecipes for in-loop reductions, and we don't currently support in-loop AnyOf reductions. We only create VPReductionRecipes in the !PhiR->isInLoop() section of adjustRecipesForReductions, and this comment from the initial patch seems to confirm this https://reviews.llvm.org/D108136#anchor-inline-1038338, so I think we can remove this check in the condition logic. I checked compiling SPEC 2017 with -prefer-inloop-predicates and the added assertion doesn't trigger.

This is split off from llvm#131300. A VPReductionRecipe will never have an AnyOf or FindLastIV recurrence, so when it calls createReduction it always calls createSimpleReduction. If we replace the call, createReduction is left with a single user in VPInstruction::ComputeReductionResult, which we can inline and then remove.

This patch changes the parent of VPReductionRecipe from VPSingleDefRecipe to VPRecipeWithIRFlags, and prints, gets, drops and controls flags through VPRecipeWithIRFlags. This removes the dependency on the underlying instruction. The patch also adds a new function `setFastMathFlags()` to VPRecipeWithIRFlags, because the entire reduction chain may contain multiple instructions and the underlying instruction may not carry the corresponding flags for this reduction. Split from llvm#113903.

createInductionAdditionalBypassValues is only used for epilogue vectorization now. Move it out of ILV, which means we do not have to thread through ExpandedSCEVs and also don't have to track the bypass values in ILV. Instead, directly create them if needed after executing the epilogue plan. This moves more of the epilogue-specific logic out of the generic executePlan.

Instead of executing the whole entry VPIRBB twice, first only execute the VPExpandSCEVRecipes and replace their uses with the expanded VPValue, which will be a live-in. This allows removing special logic in VPExpandSCEVRecipe to support executing twice and allows moving the ExpandedSCEVs map out of VPTransformState. It will also allow adding other recipes to the entry VPBB in the future.

…CI (llvm#131300) VPReductionRecipes take a RecurrenceDescriptor, but only use the RecurKind and FastMathFlags in it when executing. This patch makes the recipe more lightweight by stripping it to only take the latter two. The motivation for this is to simplify an upcoming patch to support in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to create an Or reduction, and by using RecurKind we can create an arbitrary reduction without needing a full RecurrenceDescriptor.

FindLastIV introduces multiple uses of the start value when the epilogue is vectorized, where the original source had only a single use. Each use of undef may produce a different result, so introducing multiple uses can produce incorrect results when the input is undef/poison.

If the start value may be undef or poison, freeze it and use the frozen value, which will be the same at all uses.

See the following scenarios in Alive2:

* Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr
* Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v
* Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV
* Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN

The latter two show that the resume phi must be frozen, which means we cannot freeze in the preheader. We could move the freeze to the main iteration count check, but that would be a bit fragile to find, and other transforms can sink the freeze if needed.

Depends on llvm#132689 and llvm#132690.

Fixes llvm#126836

PR: llvm#132691
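The reasoning about multiple uses of undef can be illustrated with a small toy model. This is only a sketch and does not use LLVM: `ToyUndef`, `ToyFrozen` and `usesAgree` are hypothetical names invented here. It also exaggerates the behaviour, since every read of the toy undef differs, whereas a real undef only *may* differ between uses.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for an undef value (hypothetical, not an LLVM type):
// each read is allowed to observe a different value, modelled here by
// handing out a fresh value on every read.
struct ToyUndef {
  uint64_t Next = 0;
  int64_t read() { return static_cast<int64_t>(Next++); }
};

// Toy stand-in for freeze: observe the undef value once and reuse that
// single observation at every later use.
struct ToyFrozen {
  int64_t Value;
  explicit ToyFrozen(ToyUndef &U) : Value(U.read()) {}
  int64_t read() const { return Value; }
};

// Models a transform that turns one use of the start value into two,
// for example one in the main vector loop and one in the epilogue.
// Returns true if both uses observed the same value.
template <typename StartT> bool usesAgree(StartT &Start) {
  int64_t FirstUse = Start.read();
  int64_t SecondUse = Start.read();
  return FirstUse == SecondUse;
}
```

In this model, `usesAgree` returns false for a `ToyUndef` and true for a `ToyFrozen` built from it, which mirrors why the fix freezes the start value before it gains additional uses.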

nikic

Breaks LoopUtils.h ABI in obvious ways and FMF.h ABI in less obvious ways.

Can this be fixed in a more minimal way than backporting 18 commits that include a lot of refactorings?

fhahn · 2025-04-14T19:52:42Z

I'm not sure if it is feasible to strip the fix down, as it depends on quite a few refactoring patches.

For 20.x, it might be best to just not enable epilogue vectorization for FindLastIV: #135666

pawosm-arm · 2025-04-14T20:35:38Z

Thanks @fhahn for your comment and your patch. It is a good reason for closing down this one.

lukel97 and others added 18 commits April 10, 2025 18:28

[FMF] Set all bits if needed when setting individual flags. (llvm#131321)

11ad62c

Currently isFast() won't return true if all flags are set via setXXX, which is surprising. Update setters to set all bits if needed to make sure isFast() consistently returns the expected result. PR: llvm#131321
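To make the setter behaviour concrete, here is a minimal toy model. It is not the real `llvm::FastMathFlags`: the class name `ToyFastMathFlags`, the bit layout, the `NormalizeOnSet` switch and the helper `setAllIndividually` are all assumptions made for illustration. The sketch assumes that "fast" is represented as every bit of the mask being set, so setting only the individual flag bits does not by itself compare equal to the "fast" state.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only; names and bit layout are assumptions, not LLVM's API.
class ToyFastMathFlags {
public:
  // NormalizeOnSet = false models the old behaviour, true the fixed one.
  explicit ToyFastMathFlags(bool NormalizeOnSet) : Normalize(NormalizeOnSet) {}

  void setAllowReassoc(bool B = true) { set(AllowReassoc, B); }
  void setNoNaNs(bool B = true) { set(NoNaNs, B); }
  void setNoInfs(bool B = true) { set(NoInfs, B); }
  void setNoSignedZeros(bool B = true) { set(NoSignedZeros, B); }
  void setAllowReciprocal(bool B = true) { set(AllowReciprocal, B); }
  void setAllowContract(bool B = true) { set(AllowContract, B); }
  void setApproxFunc(bool B = true) { set(ApproxFunc, B); }

  // "Fast" means every bit of the mask is set, not just the named ones.
  bool isFast() const { return Flags == AllFlagsMask; }

private:
  enum : uint32_t {
    AllowReassoc = 1u << 0,
    NoNaNs = 1u << 1,
    NoInfs = 1u << 2,
    NoSignedZeros = 1u << 3,
    AllowReciprocal = 1u << 4,
    AllowContract = 1u << 5,
    ApproxFunc = 1u << 6,
    IndividualMask = (1u << 7) - 1, // all seven named flags
    AllFlagsMask = ~0u
  };

  void set(uint32_t Bit, bool On) {
    if (!On) {
      // Drop any "all bits" padding, then clear the requested flag.
      Flags = (Flags & IndividualMask) & ~Bit;
      return;
    }
    Flags |= Bit;
    // The fix: once every individual flag is set, widen to all bits.
    if (Normalize && (Flags & IndividualMask) == IndividualMask)
      Flags = AllFlagsMask;
  }

  bool Normalize;
  uint32_t Flags = 0;
};

// Sets each flag through its own setter, never through a "set fast" call.
inline void setAllIndividually(ToyFastMathFlags &F) {
  F.setAllowReassoc();
  F.setNoNaNs();
  F.setNoInfs();
  F.setNoSignedZeros();
  F.setAllowReciprocal();
  F.setAllowContract();
  F.setApproxFunc();
}
```

With `NormalizeOnSet` disabled, calling `setAllIndividually` leaves `isFast()` false, which is the surprising behaviour the commit describes. With it enabled, `isFast()` becomes true, and clearing any single flag makes it false again.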

[LV] Split RecurrenceDescriptor into RecurKind + FastMathFlags in LoopUtils. NFC (llvm#132014)

118d349

Split off from llvm#131300, this splits up RecurrenceDescriptor arguments so that arbitrary recurrence kinds may be used down the line.

[VPlan] Get DataLayout from SE in VPExpandSCEVRecipe::execute (NFC)

c5dbd00

This doesn't rely on State.CFG.

[VPlan] Reautogenerate float-induction.ll after last commit

1f6be69

[LV] Use VPBuilder to create ComputeReductionResult. (NFC)

8dd520e

Update code to use VPBuilder, to simplify follow-up changes.

[VPlan] Split off reduction printing tests, add find-last-IV test.

e4598e0

Splits off reduction printing tests, to limit growth and add test case for printing find-last-IV (llvm#132689)

[VPlan] Add ComputeFindLastIVResult opcode (NFC). (llvm#132689)

3675c5f

This moves the logic for computing the FindLastIV reduction result to its own opcode. A follow-up patch will update the new opcode to also take the start value, to fix llvm#126836. PR: llvm#132689

[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (llvm#132690)

07699e3

Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed. Depends on llvm#132689 PR: llvm#132690

[LV] Add epilogue vectorization tests for FindLastIV reductions.

2493281

Add missing test coverage for llvm#126836.

[LV] Add FindLastIV test with truncated IV and epilogue vectorization.

e8d8800

This adds missing test coverage for llvm#132691.

[LV] Reautogenerate epilog-iv-select-cmp.ll test files after the recent commits

edcf3b5

pawosm-arm mentioned this pull request Apr 10, 2025

[Downstream change][LV] Backported patchset with the fix for the WRF issue arm/arm-toolchain#264

Closed

kiranchandramohan requested review from Mel-Chen, david-arm, fhahn and yus3710-fj April 10, 2025 19:05

nikic added this to the LLVM 20.X Release milestone Apr 13, 2025

github-project-automation bot added this to LLVM Release Status Apr 13, 2025

github-project-automation bot moved this to Needs Triage in LLVM Release Status Apr 13, 2025

nikic requested changes Apr 13, 2025

tstellar moved this from Needs Triage to Needs Fix in LLVM Release Status Apr 14, 2025

pawosm-arm closed this Apr 14, 2025

nikic moved this from Needs Fix to Done in LLVM Release Status Apr 26, 2025
